.SEC(Storage#Structures and#Efficiency,,,P210:)
.SS(Introduction)
.FP
This chapter reconciles some of the generality of LISP with the
realities of contemporary data structures and machine organization.
Though any algorithm can be coded in terms of
manipulations of binary trees, often there are more
efficient organizations of data.
For example, our numerical algorithms could be expressed as list algorithms
using %3(#)%1, %3((#))%1, %3((#)#(#))%1, and so on, as representations
for %30%1, %31%1, %32%1, respectively.
Most machines supply hardware arithmetic representations and operations,
making such list representations unnecessary.
At the next level of data organization are vectors and arrays of numerals.
These
data structures could also be stored in a list-structure format
and individual components could be accessed by %3car-cdr%* chains.
However, most
machines have a hardware organization which can be exploited to
increase accessing efficiency over the list representation.
Sequential storage for elements, often coupled with hardware index registers
for fast access to elements, makes a more effective representation.
Similarly, strings can be represented as lists of characters. The
string processing operations are expressible as LISP algorithms;
again, this is usually not the most reasonable representation.
Some machines supply special hardware aids for string operations.
Even at
the level of list-structure operations, simple binary trees might not
be the most expeditious representation. Also, many
of the algorithms we have presented in LISP are overly wasteful of
computation time.
There are no general rules for selecting data representations and
choosing programming style. Efficiency must be balanced against generality.
The chosen representation
must match the machine and the problem domain being studied. If
the problem is strictly numerical, then list-structure is overly general.
If simple string manipulation is sufficient,
then list processing also may be too general. There are many applications
of list processing which are sufficiently well#behaved that
complex devices like garbage collectors are unnecessary.
However,
understanding the programming art in
a rich environment such as LISP prepares the programmer
to apply these
techniques in a meaningful way. Many times a
representation in LISP is all that is needed; a "throw-away
implementation" may answer the question. A clean
representation with comprehensible algorithms is developed first.
Once
%6a%* representation exists, it is easier to develop better ones.
.SS(Vectors and Arrays,,P137:)
.BEGIN TABIT2 (10,40);CENTERIT;FILL;
.FP
%2Vectors%*.##Vectors, also known as one-dimensional arrays, are usually
stored sequentially in memory. Simple vectors are usually stored one
element to a memory location though this is not a necessity; for
example, a vector representation of a complex number may be
stored as pairs of cells.
If vectors of nonhomogeneous data modes are contemplated,
each element would include type information.
Also, we have seen a representation
of a stack as a (sequential) vector with access
made via a global pointer to the vector.
In any case,
most languages place some restrictions on the behavior of vectors
so that elements can be accessed efficiently.
Vectors are an attractive representation when the size of data objects
will not vary. Given such a static behavior, machines can perform
access and updating of the elements rapidly.
.pt24;
.FP
%2Arrays%*.##Arrays are vectors which allow vectors as elements.
For example, a two-dimensional array is a vector whose elements
are vectors of individuals. We will restrict attention to
arrays whose elements are all of the same dimensions; efficient
representation of more general
arrays, called %2ragged arrays%1, will be examined in {yonss(P224)}.
We will restrict our attention further to
two-dimensional arrays, though
most of the discussion generalizes very
naturally.
Since
most machine memories are organized as linear devices, we map arrays onto a
linear representation.
A common implementation stores the array by rows; that
is, each row is stored sequentially - first, row 1; then row 2,#...
and so on.
A simple calculation finds the
location of an arbitrary element, A[i;j], given the location of
the first element A[1;1] and the length of each row of the
array. For an array A[1:M; 1:N],
.EQ;
loc[A[i;j]] = loc [A[1;1]] + (i-1)*N + (j-1)
.pt18;
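.FP
For example, for an array A[1:3; 1:4] with loc[A[1;1]] = 1000,
loc[A[2;3]] = 1000 + (2-1)*4 + (3-1) = 1006.
.pt18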
.FP
In languages like Fortran, which require that the size of the array be
known at compile-time, the compiler can generate the accessing code
as references to specific memory locations.
Languages like Algol 60 and some versions of LISP
allow the size of an array to be determined at run-time.
Algol 60, for example, requires the declaration of the
type (real, boolean, etc.) of the array and specification of the number of
dimensions in the array, but the
size specification of each dimension can be postponed until run-time.
To implement this flexibility,
a %2dope vector%1 is introduced.
A ⊗→dope vector↔← is a header or %2descriptor%1 associated with
the area containing the actual array elements. The information in the dope vector
tells the functions which access the array how to treat the data.
Type and dimensionality are typical entries in dope vectors.
The compiler can determine the size of
the dope vector, but cannot determine its contents.
The dope vector is filled in
when the array declaration is executed;
at that time the array bounds are known.
The
compiler cannot allocate space for the array elements; the
allocation must be done at run-time.
At that
time we allocate space and complete the dope vector. All
references to array elements must use the dope vector.
Assume that
the array elements are stored by rows. Look at the calculation of
the location of element A[i;j]. For a specific execution of an array
declaration much of this information is constant: the location of
the array elements, in particular A[1;1], and the number of columns
N are fixed.
.Group;
Thus we rewrite the calculation as:
.PT18;
\###constant part\variable part
\[loc [A[1;1]]-N-1]#######+\######(i*N+j)
.pt18;
The constant part is stored in the dope vector. When we wish to
reference an element A[i;j] we need only compute the variable part
and add it to the constant part.
.APART
The dope vector for A[1:M; 1:N] might contain
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
⊂ααααααααααααααα⊃
~ 2 ~
εααααααπααααααααλ
~ 1 ~ M ~
εααααααβααααααααλ
~ 1 ~ N ~
εαααααα∀ααααααααλ
~ constant part ~
εαααααααααααααααλ
~ . . . ~
array elements
~ . . . ~
%ααααααααααααααα$
.END
.BOXB
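.FP
The accessing computation itself might be sketched as a LISP-styled algorithm.
Here %3dv%*, %3constpart%*, and %3cols%* are hypothetical selectors giving
the dope vector of an array, its constant part, and its column count N;
%3fetch%* is a hypothetical primitive which references a machine location:
.BEGIN SELECT 3;GROUP;TURN OFF "←";
.KRK
.BOXA
access <= λ[[a;i;j] fetch[plus[constpart[dv[a]];times[i;cols[dv[a]]];j]]]
.BOXB
.END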
.FP
There is another scheme for storing arrays which is used in
some of the Burroughs machines#(⊗↑[Org#71]↑,#⊗↑[Dor#76]↑).
Each row is stored sequentially and
access to separate rows is made through a device called a
%2⊗→mother-vector↔←%*. The mother-vector is a vector of pointers to the
rows.
.GROUP;
Thus,
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "[" FOR "\";
.TURN OFF "\";
.TABS 57,66;
.KRK
⊂ααααααα⊃ ⊂αααααααααααααααααα [αααα[⊃
~ #ααβααα→~ A∞411∞* A∞412∞* ...[A∞41N∞*[~
εαααααααλ %αααααααααααααααααα [αααα[$
~ #ααβαα→ . . .
εαααααααλ
. . .
εαααααααλ ⊂αααααααααααααααααα [αααα[⊃
~ #ααβααα→~ A∞4M1∞* A∞4M2∞* ...[A∞4MN∞*[~
%ααααααα$ %αααααααααααααααααα [αααα[$
.END
.BOXB;
.APART
Notice that the accessing computation is very inexpensive.⊗↓However, access
by array columns can be expensive. If each row is on a separate
page in the machine, the access overhead can be substantial.←
On a virtual memory machine
this array organization can be helpful;
all rows need not be in memory at once.
If an access to a row not in
core is made, a "page#fault" is raised; the monitor brings the row
into memory and the computation continues. The mother-vector scheme
generalizes to multidimensional arrays, and can also be used in
conjunction with a dope vector.
.END
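.FP
Accessing in the mother-vector scheme can be sketched as two indexing
operations; %3mother%* and %3vref%* are hypothetical primitives which
select the mother-vector of an array and index a vector:
.BEGIN SELECT 3;GROUP;TURN OFF "←";
.KRK
.BOXA
access <= λ[[a;i;j] vref[vref[mother[a];i];j]]
.BOXB
.END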
An implementation of an array facility in LISP might include
a declaration:
.DEF
⊗→%3array%*↔←%3[%1<identifier>;<type>;<bounds>; ... ;<bounds>], where the
identifier names the array; the type could be numeric or S-expr; and finally
a declaration of upper and lower bounds for each dimension would be needed.
%3array%* is a special form whose effect is to make the array name a %3SUBR%*,
whose code is the calculation of the dope vector. Thus,
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
⊂ααααααπαααα⊃ ⊂ααααααααααααα⊃
~ SUBR ~ #αβ→αα→~ dope vector ~
%αααααα∀αααα$ ~ calculation ~
εαααααααααααααλ
~ array ~
~ elements ~
. . .
%ααααααααααααα$
.END
.BOXB
.FP
If we are to store S-exprs in the array, then the ⊗→garbage collector↔← must
be able to mark the entries. This is the reason for including type information.
When an array element is to be referenced, the subscripts are evaluated
(since the array name was declared as a %3SUBR%*) and the dope vector code
is executed. That computation results in a reference to the appropriate cell.
.GROUP;
We also must be able to store information in the array.
.DEF
%3store[%1<name>[<subscr>; ... ;<subscr>];<value>] : ⊗→%3store%*↔← is a special form
whose effect is to store the value of <value> in the designated array element.
.APART
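.FP
For example, assuming the bounds of each dimension are written as a lower
and an upper limit, %3array[A;FIXNUM;1;10]%1 might declare a one-dimensional
numeric array; %3store[A[3];4]%1 would then place %34%1 in the third element,
and a subsequent reference %3A[3]%1 would evaluate its subscript, execute
the dope vector code, and yield %34%1.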
We have discussed storage allocation and accessing of array elements.
Important distinctions in language design appear in discussing
deallocation of array space. Typical Algol-like languages impose
a stack discipline on storage management. This implies that the array
elements may be allocated in the run-time stack. It also implies that
the elements become inaccessible once the block which allocated
that array has been exited. This is implemented by popping the array from
the stack. There are two ways for a language to
assure that no references to "inaccessible" elements can occur
(such references are called %2dangling references%1).
Either restrict the semantics of the language such that no
such references can occur (Algol 60), or allow constructs which %6may%1
cause dangling references, but declare any
occurrence to be an error (Algol#68).
LISP-like languages suppose that data structures
are to be retained as long as they are accessible; that treatment is
also given to LISP arrays. Therefore arrays are allocated and deallocated
in a manner similar to the %3cons%1 operation for S-exprs; sequential blocks
are maintained in a free list; we will say more about this in {yonss(P224)}.
The two management philosophies for deallocation of data structures
are characterized as the %2deletion strategy%1 and the %2retention strategy%1;
see ⊗↑[Ber#71]↑.
.CENT(Problem)
.NL
1.##Implement a stack in LISP first using lists or dotted pairs, then using
an array. Include implementations of the stack operations.
.SS(Strings and Linear LISP,string processor,P27:)
.BEGIN TURN ON "←";
.FP
On {yon(P216)} we discussed one representation for LISP print names:
a linked list of full words; each
full word contained a segment of the
atom name. Print names are a special instance of a data structure
named strings; our use of strings in LISP has been restricted to
manipulating string constants. In this section we will discuss alternative
representations for strings, and discuss further operations on string objects.
Most production LISP systems have a comprehensive set of string operations.
As with numbers and vectors, string operations could easily be
represented as operations on S-exprs; however it is frequently more efficient
to represent strings as a separate abstract data structure.
Each string object is a sequence of characters. The elements
of a string may not be strings; this is the essential difference
between sequences and strings.
That simplification of data structures introduces some different
aspects of storage management. It is these issues which we will emphasize
in this section.
The primitive string manipulations --#constructors, selectors, recognizers,
and equality#-- are similar to those for sequences. Therefore we use
LISP M-expression syntax when describing the operations;
for that reason we call the string language, %2linear LISP%1.
The implementation
allows the creation of strings
of arbitrary length; it allows the generation of new
strings and the decomposition of existing strings. Since
arbitrary length strings are to be created, an organization similar
to free space will be used. The storage area for strings will be called
%2string space%1.
String space is a linear sequence of cells; each cell can
contain one character. A string will be represented as a
sequence of contiguous character cells.
The value of a string variable will be
represented as a pair, containing character count and a pointer to the beginning
of the character sequence in string space.
.GROUP;
Thus,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.KRK
⊂αααπααα⊃
~ 3 ~ # ~
%ααα∀αβα$
~
%α→ααα→⊃
↓
. . .ααααπαααπαααπαααπαααπαααπααα . . .
A B B 1 D . . . string space
. . .αααα∀ααα∀ααα∀ααα∀ααα∀ααα∀ααα . . .
.BOXB
.END
.fp
encodes the string %3ABB%*.
.APART
.BEGIN TURN OFF"←";
.fp
There are two primitive selector functions: ⊗→%3first%*↔← and
⊗→%3rest%*↔←.
.END
.GROUP;
.DEF
%3first[x]%* is the first
character of the string represented by %3x%*. %3first%* is undefined for the
empty string,#%7e%1. For example,
.EQ
%3first[ABC]%1 is %3A; first[%7e%3]%1 = %9B%1
.APART
.GROUP;
.DEF
%3rest[x]%* is the string of characters which remains when the first
character of the string is deleted. %3rest%* is also undefined for the
empty string. For example,
.EQ
%3rest[ABC]%* is %3BC%*
.boxb
.APART
.GROUP;
.fp
There is one constructor primitive.
.DEF
%3concat[x;y]%* creates a new string. %3x%* is a character; %3y%* is
a string. %3concat%* forms a string consisting of the concatenation of
%3x%* with %3y%*. For example,
.EQ
%3concat[A;BC]%* is %3ABC%*
.APART
.boxb
.GROUP;
.fp
There are two primitive recognizers:
.boxa
.begin nofill;
.krk
⊗→%3char%*↔←%3[x]%1: is %3x%* a single character?
.pt2
⊗→%3null%*↔←%3[x]%1: is %3x%* the empty string?
.END
.APART
.GROUP;
.fp
.begin tabit1(27)
For example:\%3char[A] %1is %et%1
\%3char[AB] %1is %ef%1
.END
.pt18
.APART
.Fp
Finally, we include a version of the equality predicate which
will determine if two characters are identical.
.begin tabit1(22);
\%3x ⊗→%3=%*↔← y%1: are %3x%* and %3y%* the same character?
.pt2;pt2
\%3AB = AB %1is %9B%1
.end
.boxb
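.FP
These primitives suffice to express string algorithms in the usual recursive
style. For example, a sketch of an equality predicate on whole strings:
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(22);
.KRK
.BOXA
strequal <= λ[[x;y]\[null[x] → null[y];
\ null[y] → %ef%3;
\ first[x] = first[y] → strequal[rest[x];rest[y]];
\ %et%3 → %ef%3]]
.BOXB
.END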
.FP
The implementation of these string primitives is less complex
than that of LISP primitives.
%3first%* generates a character count of one and a pointer to the
first character of the parent string.
%3rest%* generates a character count of one less than that of the
parent and a pointer to the second character of the parent string.
Therefore, this implementation shares substrings, just as %3car%1 and %3cdr%1
share substructure.
The implementation of the recognizers and the equality predicate is straightforward.
We will blur the distinction between characters
and strings of length one. Thus %3char%* need only check the character count.
%3null%* gives %et%1 if the count is zero.
To implement equality, we note that characters are not stored uniquely,
so we must make an actual character
comparison.
As with full LISP, the implementation of the constructor requires more
care. Since our implementation requires that string components be contiguous,
we must copy the arguments to %3concat%1.
To evaluate %3concat[x;y]%1, we copy %3x%*, then copy %3y%*
so that %3y%1 follows %3x%1 in free string space; we
generate a character count of one plus the count of %3y%*, and
generate a pointer to the copy of %3x%*. The copies are
made in the free string space in a manner similar to that used in %3cons%1.
The storage management is somewhat different from that of
a simple LISP implementation.
Since the copying operations within %3concat%1 allocate space, we must
include some method for deallocating space. Though simpler methods
may suffice, we use a garbage collector.⊗↓Since string operations are quite
well-behaved, a reference counter could be used. We use a garbage collector
for its elegance and its
pedagogical value for the next section.←
The marking phase is much simpler than that for LISP; it is not
recursive. We use the
descriptor in the symbol table to mark each character string.
However, we cannot stop marking simply because we
have encountered a previously marked character. Since we are sharing
substrings, we must visit each character in the string.
The sweep
phase needs to be more comprehensive for string collection. Since
strings are stored sequentially, a fragmented string
space is of little use. We must compact all the
referenceable strings into one end of string space, and free a linear
block for the new free string space. Since we are sharing substrings,
a little care must be exercised. When we move a string,
the descriptor of any variable referencing any part of that
parent string must be changed to reflect the new location. So before
we begin the relocation of strings, we sort the string descriptors on
the basis of their pointers into string space. We then relocate
each parent string, moving it down into freed locations and updating
the address pointers in the descriptors of any substrings.
Eventually all strings will be compacted; the
string space pointer can be set and the computation continued.
Next, we adapt this
compacting garbage collection technique for use in LISP.
.END
.SS(A Compacting Collector for LISP,,P291:)
.P173:
.FP
We can combine the simplicity of the original mark-sweep garbage collector
with the sophistication of the collection phase of the string
garbage collector and
produce a compacting garbage collector for LISP.
There are several motivations for compacting storage. First, besides making the
%6active%* storage contiguous, we also make the %6free%* locations contiguous.
Thus the free lists can be handled as vectors rather than as lists.
This simplifies storage allocation: to allocate the next
free element, take the next element in the free space vector.
Another reason for
concern with compacting is related to hardware. If the underlying machine is
using a paging scheme, then we can try to minimize page-faults by keeping
the LISP structures localized. In the worst case, we could have every element
of a list on a separate page; this could require that the memory manager
retrieve a new page for every
reference.⊗↓Very little empirical work has been done on the actual storage
requirements and running environment of LISP.
A start is made in ⊗↑[Cl#76]↑; much more should be done.←
However,
we cannot restrict the operations of the LISP programmer. The underlying hardware
%2must%* be invisible to the user. The next best thing is to try to keep
the structures as local as possible; compaction of spaces is a first attempt
at this. We will discuss other lower-level tricks later.
Compaction is important in languages other
than LISP.
If the language allocates storage in a manner similar to LISP
but the constructs allow %6different%*-sized blocks to be specified
(a string processor is a simple example), then
compaction may be necessary.⊗↓As we shall soon see, the rationale is
applicable in LISP as well.←
Granted that compaction is a worthwhile endeavor, we proceed.
We can't simply mark all the
active cells and then move them into unmarked cells to compact the space.
We must also maintain the original topological relationships between the elements.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 45;
.KRK
.BOXA
⊂αααπααα⊃ ? ⊂αααπααα⊃
204 ~204~204~∞1 is not the same as ∞b ?100 ~204~204~
%ααα∀ααα$ ? %ααα∀ααα$
.BOXB
.END
Besides moving the cells, we must also update each reference to a moved location:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 45;
.KRK
.BOXA
⊂αααπααα⊃ ? ⊂αααπααα⊃
204 ~204~204~∞6 is∞1 the same as ∞b ?100 ~100~100~
%ααα∀ααα$ ? %ααα∀ααα$
.BOXB
.END
.FP
To handle these problems, we expand the sweep phase into two phases:
the %2relocating%* phase and the %2updating%* phase.
The relocating phase begins after all active structure is marked.
Assume we are to compact all active structure %6down%* to the
bottom of the space. First we initialize two pointers: a %2free%* pointer
to the lowest cell in the space; and an %2active%* pointer to the top of the space.
We move the active pointer down until we come across a %6marked%1 location;
we move the free pointer up until we locate an %6unmarked%* cell.
We want to move that marked cell down into the free location; but
we must also supply enough information to maintain the original relationships
in the transformed structure. The cell we move may reference other cells
which will be moved.
.BEGIN GROUP;
Here's a picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.krk
.BOXA
. . .
⊂αααπααα⊃
77 ~ 65~402~
%ααα∀ααα$
. . .
⊂αααπααα⊃
100 ~ ~ ~ ←∞1free pointer∞*
%ααα∀ααα$
. . .
⊂αααπααα⊃
155 ~ ~ ~
%ααα∀ααα$
. . .
⊂αααπααα⊃
204 ~402~ 77~
%ααα∀ααα$
. . .
⊂αααπααα⊃
402 ~204~402~ ←∞1active pointer∞*
%ααα∀ααα$
. . .
.BOXB
.END
.END
.FP
Cell %b77%* was active so we left it alone; it references cell %b65%*,
which has already been visited, and also references cell %b402%*, which is about
to move. We move the contents of cell %b402%* into cell %b100%*, and to
let everyone know where the contents has gone, we leave a forwarding address of
%b100%* in location %b402%*.
.BEGIN GROUP;
Thus,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
⊂αααπααα⊃
100 ~204~402~ ←∞1free pointer∞*
%ααα∀ααα$
. . .
⊂αααπααα⊃
402 ~ ~100~ ←∞1active pointer∞*
%ααα∀ααα$
.BOXB
.END
.END
.FP
The active pointer, having writ, moves on;
it skips over any unmarked cells, looking for the next marked
location. Assume the next marked
location is %b204%*. It stops there and waits for the free pointer
to discover that location %b155%* is the next free location.
In its search the free pointer will skip over any marked cells.
The same relocation operation occurs: the contents of %b204%* is moved
to location %b155%*, and the forwarding address of %b155%* is placed in location
%b204%*. The process continues until the two pointers collide.
Call that collision point %2col%*.
When they meet, all locations above %2col%* either will be free or will
contain forwarding addresses. All addresses, %2col%* and below, will contain
marked words or relocated cells.
We are now ready to enter the %2update%* phase.
.BEGIN GROUP;
Here is the picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.krk
.boxa
. . .
⊂αααπααα⊃
77 ~ 65~402~
%ααα∀ααα$
. . .
⊂αααπααα⊃
100 ~204~402~
%ααα∀ααα$
. . .
⊂αααπααα⊃
155 ~402~ 77~
%ααα∀ααα$
. . . ← ∞2col∞*
⊂αααπααα⊃
204 ~ ~155~
%ααα∀ααα$
. . .
⊂αααπααα⊃
402 ~ ~100~
%ααα∀ααα$
. . .
.boxb
.END
.END
.FP
We examine the initial segment of our space from the bottom to %2col%*
looking for any references to that area %6above%* %2col%*. A reference
to that area must be changed. What is found in the referenced cell is not
the desired information, but is the forwarding address of the desired information.
What to do is obvious: tell the sender what the change of address is.
Thus the %3cdr%*-part of cell %b77%* becomes %b100%*; the %3car%*-part doesn't
change. Cell %b100%* refers to two relocated cells; we find their forwarding
addresses, and cell %b100%* becomes
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
⊂αααπααα⊃
100 ~155~100~
%ααα∀ααα$
.BOXB
.END
.FP
Similar treatment is given to cell %b155%*, modifying the %3car%*-part.
When all cells below %2col%* are updated, the garbage collection is finished.
The cells above %2col%* are all available for the free-list.
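.FP
The treatment of a single cell in the update phase can be sketched using the
pointer modification primitives %3rplaca%* and %3rplacd%* of {yonss(P171)}.
We assume, as in the pictures, that a forwarding address is kept in the
%3cdr%*-field of a relocated cell; %3loc%* is a hypothetical primitive giving
the machine address of a node, and %2col%1 is the collision point:
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(25);
.KRK
.BOXA
fwd <= λ[[p][greaterp[loc[p];col] → cdr[p]; %et%3 → p]]
.PT2
update <= λ[[x] prog[[]\rplaca[x;fwd[car[x]]];
\rplacd[x;fwd[cdr[x]]] ]]
.BOXB
.END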
.GROUP;
.CENT(Problems)
.NL
1.##Is %2col%* in the free space list after the update phase?
.NL
2.##Write a LISP algorithm for compacting garbage collection in LISP.
.APART;
.SS(Bit-tables,,P263:)
.FP
In the marking phase of a garbage collector it is
necessary to record the visitation of each word.
Frequently it is not possible to place a mark in the actual word.
This might occur for several reasons:
.SKIP 1
.NL;
%21.%1##For a word in FS, there is no
room if each word contains exactly two addresses.
.NL;
%22.%1# For a word
in FWS, the information would be changed if we modified a bit.
.NL;
%23.%1# In structures more complex than dotted pairs, there may not be room for
marking bits.
.NL;
%24.%1# If a mark bit is assigned in each word, then the initialize phase
requires that we visit each word. This violates
"locality of reference".⊗↓Locality refers to the relative distance between
memory locations assigned in a particular structure. In some machine organizations,
memory is divided into "pages" of a relatively small size. There is significant
overhead involved in crossing page boundaries. Therefore memory referencing
which entails many scattered references is said to violate "locality of
reference."←
.pt24;
.FP
An alternative solution designates a separate section of memory called a
%2bit-table%1. The bit-table is a sequence of binary flags such that there
is a one-to-one correspondence between a flag and a markable memory location.
Whenever we wish to record the visiting of a word, we set the corresponding flag
in the bit table.
A bit table is represented as a sequence of machine locations with several
flags represented in each word.
The initialization phase is improved since it is faster to initialize a whole
table rather than initialize single bits in separate words.
The mark phase is rapid if there
is a simple calculation to relate each
bit in a word with its corresponding markable location.
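.FP
For example, on a machine with %3W%* flags to a word, the flag for markable
location %3n%* is bit %3remainder[n;W]%1 of word %3quotient[n;W]%1 of the
table. A sketch, where %3btbase%* is the hypothetical base address of the
bit-table and %3set_flag%* is a hypothetical primitive which sets a
designated bit in a word:
.BEGIN SELECT 3;GROUP;TURN OFF "←";
.KRK
.BOXA
set_mark <= λ[[n] set_flag[plus[btbase;quotient[n;W]];remainder[n;W]]]
.BOXB
.END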
.SS(Representations of Complex Data Structures,,P138:)
.FP
In our discussion of abstract context-free data structures in {yonss(P287)},
we isolated three kinds of structures:
.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 ... %eD%41%1
e.g.,←<seq> ::= %2(%*<seq elem>%2,%* ..., <seq elem>%2)%*
.END
.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 | %eD%42%1 | %eD%43%1
e.g.,←<seq elem> ::= <indiv> | <seq>
.END
.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 %eD%42%1 %eD%43%1 ... %eD%4n%1
e.g.,←<sexpr> ::= %3(%*<sexpr> . <sexpr>%3)%*
.END
.PT18;
.FP
We have discussed the behavioral characteristics of algorithms
which operate on these structures. Now we wish to examine
the storage structure aspects of these data structures.
Corresponding to these three
data structures are three "natural" storage representations.
By "natural" we mean that even though all these structures
can be represented as LISP S-expressions, for example, there
are representations which might better suit the operations
which we expect to perform on those structures.
Since "natural" is not a well-defined term, we will
clarify its meaning using examples of context-free data structures.
The first type of data structure given above maps naturally onto
a representation which contains information that the object is of
type %eD%1 and contains space for storing instances of this data type.
Elements of type %eD%1 are homogeneous, being all of type %eD%41%1;
however, the size of a type %eD%1 element is indefinite.
Depending on the operations which are to be performed on the representation,
either a list representation or an array representation is reasonable
for the storage structure. Unless the operations are quite
complex, a sequential allocation scheme suffices.
The second type of data structure is frequently represented as a pointer.
There really isn't any storage allocated for objects of this type.
Instances which satisfy this equation have their storage
requirements set by one of the %eD%4i%1 alternatives.
We will discuss pointer manipulation in LISP in the next section.
This section will discuss the third abstract data structure.
The essential characteristic here is that instances of this structure
have a fixed number of components, and
those components need not be of homogeneous type.
Those components are typically referenced by name.
These characteristics form a natural distinction between
this third class and the first class, even though an appropriate encoding
would make it possible to represent either class in the other.
.GROUP;
For example, in equations like
.BEGIN CENTERIT;
.PT18;
←<sexpr> ::= %3(%1<sexpr> . <sexpr>%3)%1
.ONCE INDENT 0;
or ←<form> ::= <function>[<arg-list>]
.END
.PT18;
.FP
we reference components by selectors like %3car%1, %3cdr%1, %3func%1,
and %3arglist%1.
.APART;
LISP represents instances of the above equations
as objects of the first and second types of data structure:
variable-length lists of pointers.
As a result, we have thought of these selectors as operations
which might require some nontrivial amount of computation
to discover the desired component, but as we saw in
{yonss(P290)} what is algorithm and what is data depends on your point of
view. For example, we could think of a dotted pair as an array which has
two components, one referenced by %3car%1, one referenced by %3cdr%1.
We say "array," since the number of components is known; but the
element references are done by nonnumerical names.
The natural storage requirements for such objects imply a fixed amount
of storage. That storage can be sequentially allocated since the
size of the element will not vary. The representation must also
encode the scheme for associating external selector with internal
representation.
.GROUP
For example,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK;
.BOXA
⊂αααααπααααα⊃
~ CAR ~ ~
εαααααβαααααλ
~ CDR ~ ~
%ααααα∀ααααα$
.BOXB
.END
.APART
.FP
Notice that the array-referencing mechanisms have to solve a similar
problem. However, array representation is such that the dope vector
can perform a %6calculation%1 to locate the element.
The storage element which we are developing is called
a %2⊗→record↔←%1 (⊗↑[Pop#68]↑), or a %2structure%1 (⊗↑[Alg#75]↑,
⊗↑[EL1#74]↑), or a %2plex%1 (⊗↑[Han#69]↑).⊗↓A similar device, called a
%2hunk%1, has been implemented
in MacLISP ⊗↑[Ste#pc]↑.←
.BEGIN TURN OFF "←";
Besides the usual constructors, selectors and recognizers, records
may be supplied with a function
to modify components of a structure. This function is called an
%2updater%1.
Just as we can write %3A[43]#←#56%1 where %3A%1 is an array, an
updater function would implement a statement like %3car[x]#←#(A#.#B)%1.
.END
Updating of simple variables is called assignment.
A discussion of "updating" of more general
data structures requires a deeper understanding of
the implementation and storage structures. In the case of
LISP, it requires a discussion of
pointers. That is the topic of the next
section.
.SS(%3rplaca%* and %3rplacd%*,,P171:)
.BEGIN TURN ON "←";
.FP
The discussion in {yonsec(P188)} developed an implementation of the
LISP operations in terms of the manipulation of pointers.
Those manipulations allowed the
creation of new structure or allowed
sharing of an existing structure. None of these operations involved the
modification of an existing structure.
In this section we will discuss
some LISP coding
tricks which do involve modification operations.
.BEGIN GROUP;
First, consider
.boxa
%3←append <= λ[[x;y][null[x] → y; %et%3 → concat[first[x];append[rest[x];y]]]]%1
.boxb
.END
.FP
This function copies %3x%* onto the front of %3y%*.⊗↓Note: %3y%* is not copied.←
Or recall the %3subst%* function: it generates a copy with the correct
substitutions made; the substitutions are not made in the original S-expr.
Since it is the constructors which carry out the copying operations, and since
the application of a constructor may initiate the expensive operation
of garbage collection, we should examine the possible ways of reducing
copying.
Consider the expression %3append[(A B C);(D E)]%*. It appears that we
could get the effect of %3append%* by %3rest%*-ing down the list
%3(A B C)%* until we found the terminator; then we could replace
that terminator with a pointer to the list %3(D E)%*. Thus,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ A ~ #αβα→~ B ~ #αβα→~ C ~≤'. . .→~ D ~ #αβα→~ E ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.FP
The resulting structure does indeed look like one we would have
obtained from %3append%1. The operation we want to perform
%6modifies%1 the existing structure, instead of %6copying%1 it as %3append%1
would have done. Such modifications can cause serious difficulties.
.GROUP;
Let us call the modifying version of %3append%1, %3nconc%1; and consider the
execution of the following sequence of statements:
.BEGIN TABIT1(30);TURN OFF"←";SELECT 3;
%1first%3\i ← (A B C)
%1then%3 \j ← (D E)
%1and finally,%3\k ← nconc[i;j]
.pt18
.END
.APART
.FP
After execution of the third statement, %3k%1 would have
the expected value %3(A#B#C#D#E)%1.
However %3i%1 would %6also%1 have this value since we
modified the structure assigned to %3i%1.
Also, any value
which was sharing part of the structure of %3i%* will also be changed.
%3nconc%1 is a pointer modification procedure; its effect can be quite far-reaching
and unexpected. Exclusion of such features is one solution. However, a programming
language %6is%1 a tool, and used carefully, such features are valuable
for decreasing storage requirements and execution time.
Inclusion of such features must be done with care, however. The chance for
inadvertent application must be minimized.
With the preceding admonitions, we introduce the LISP pointer modification
primitives. Their appearance at this position in this text
indicates that such operations
are not critical to an understanding of programming languages, and also that
such features should not be used without a reasonable understanding of that
language.
Pointer modification functions for LISP are defined in terms of two primitive
operations; %3rplaca%1 replaces the %3car%1 pointer; %3rplacd%1 replaces
the %3cdr%1 pointer.
.BEGIN GROUP;
The expression ⊗→%3rplaca%*↔←%3[x;y]%* replaces the %3car%*-part of %3x%* with %3y%*.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
∞3dest∞* ∞3env∞*
~ ~
~ ⊂ααααααα⊃ ~ ⊂ααααααα⊃
%→ ~ #αβα⊃ %→ ~ #αβαα###→
εαααπαααλ ↓ εαααπαααλ
# # # ←$ ~ x ~ # βα→α⊃
~ ~ β→ - - →⊃ εαααβαααλ ~
εαααβαααλ ~ y ~ #αβ→⊃ ↓
~ ~ ~ ↓ %ααα∀ααα$ ↓ ~
# # # ⊂αααπααα⊃ ⊂ααααααα⊃ ~ ↓
εαααβαααλ ~ # ~ ? ~ ⊂ - - →~ ? ~←$ ~
~ ~ ~ ⊂→%α|α∀ααα$ | %ααααααα$ ~
%ααα∀ααα$ ↑ % - → - →$ ↓
%←ααα←ααααα←αααα←ααααα←ααααα←ααα$
.BOXB
.END
.LE(6,Algorithm for %3rplaca%1)
.END
.FP
The AMBIT/G description of %3rplacd%1 was given on {yon(P246)}.
.BEGIN GROUP
.fp
Now %3nconc%* can be defined as:⊗↓Since we're really involved with the representation
we use %3car%* and %3cdr%* rather than %3first%* and %3rest%*.←
.BEGIN SELECT 3;TABIT2(16,19);TURN OFF"←";
.KRK
.BOXA
nconc <= λ[[x;y] prog[[z]
\\[null[x] → return[y]];
\\z ← x;
\a\[null[cdr[z]] → rplacd[z;y]; return[x]];
\\z ← cdr[z];
\\go[a] ]]
.BOXB
.END
.END
.BEGIN SELECT 3;TABIT2(18,28);TURN OFF"←";GROUP;
.KRK
.BOXA
%1Consider:%*\prog[[x]\x ← (NOTHING CAN GO WRONG);
\\rplacd[cdddr[x];cddr[x]];
\\print[x]]
.BOXB
.END
.FP
This expression will generate a circular list.
Circular lists cannot be generated in LISP without functions like
%3rplaca%* and %3rplacd%*.
See the problem on {yon(P55)}.
In general, to circularize a nonempty
list, %3x%*, %3rplacd[last[x];x]%* suffices where
←%3last <=λ[[x][null[cdr[x]] → x; %et%3 → last[cdr[x]]]]%1
.BEGIN GROUP
←%2Problems%*
.NL
1.##What is the effect of evaluating %3rplacd[x;cdr[x]]%*?
.NL
2.##Recall the problem on ⊗→hash consing↔← on {yon(P154)}.
There we were contemplating unique storage for all S-exprs.
Can such a scheme be reconciled (efficiently) with functions like
%3rplaca%* and %3rplacd%*?
.NL
3.##It has been pointed out that %3rplaca%1 and %3rplacd%1 are
closely related to assignment statements ⊗↑[And#76]↑.
Extend one of our evaluators to recognize expressions like:
.BEGIN CENTER;TURN OFF "←";
.pt2;pt2
%3car[%1<form>%3] ← %1<form>
.END
.ONCE INDENT 4;
as abbreviations for:
.BEGIN CENTER;SELECT 3;
rplaca[%1<form>; <form>%3]%1
.END
.ONCE INDENT 4,4;
This extension of assignment is obviously not restricted to
%3rplaca%1 but could allow arbitrary forms on the left-hand
side of an assignment.
.END
.END
.SS(Applications of %3rplaca%* and %3rplacd%*,,P155:)
.BEGIN TURN ON "←";
We begin with rather simple examples. Consider the problem of inserting
an element into the middle of a list. For example let %3x%* be the list
%3(A B C)%*. If we wished to insert an atom, say %3D%*, between %3B%*
and %3C%*, we could perform
.pt18
.BEGIN CENTER; SELECT 3;turn off"←";
x ← cons[car[x];cons[cadr[x];cons[D;cddr[x]]]]
.END
.PT18
.FP
We recopy the initial segment
of %3x%*, adding %3D%* at the appropriate place.
In appropriate circumstances, we can use %3rplacd%1 to insert elements
into lists, using fewer %3cons%1 operations.
For example, given the list %3(A B C)%* with pointers %3x%* and %3y%* into it
as follows:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 12,12
.BOXA
x y
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβαα→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.group
.fp
we could insert the element %3D%* %6after%* the first element in %3y%* by
###%3rplacd[y;cons[D;cdr[y]]]%*,##giving:⊗↓Notice
that %6one%* application of %3cons%* is unavoidable.←
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 8,8;
.BOXA
x y
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβα⊃ ⊂→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ ↓ ↑ %ααα∀αα$
↓ ~
⊂αααπααα⊃ ↑
~ D ~ #αβα$
%ααα∀ααα$
.BOXB
.END
.FP
But note that the value of %3x%* has also been changed, and
any S-expr sharing the list %3x%* or %3y%* as a sublist has also been affected.
.APART
.APART
.APART
We could delete the element %3D%1 by
.begin center;turn off"←";
.boxa
%3x ← cons[car[x]; cons[car[y];cddr[y]]]%1
.boxb
.end
We can also use %3rplacd%* to delete %3D%1 without using %3cons%1;
we delete not the %6first%1 element of %3y%1, but the next element
in %3y%* by
.EQ
%3rplacd[y;cddr[y]]%*
.PT18;
.FP
Similarly, we can use %3rplaca%* to modify an element in a list (or S-expr).
To change the first element in the list, %3y%*, to the S-expr %3z%* use
.EQ
%3rplaca[y;z]%*
.PT18;
.FP
Notice that the uses of %3rplacd%* for insertion and deletion are couched in terms
of insert %6after%* and delete %6after%*, rather than insert %6at%* or delete %6at%*.
If you look at a diagram you will see why.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 12,12;
.BOXA
x
~
↓ ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.fp
To delete the element %3B%* requires modifying the %3cdr%*-part of the predecessor
cell; a similar remark applies to insertion at a specified cell.
A simple, perhaps inefficient, scheme to support such modification
would be to start a second pointer
from the beginning of the list, looking for the cell whose %3cdr%* pointed to the
desired spot; then make the modification.
If these "modification-%6at%*" functions were to be performed very frequently,
then it might be worth starting %6two%* pointers down the list, one at %3x%*,
one, say %3y%*, at %3cdr[x]%*, as above. Then testing could be done using %3y%*
and the modification could be done using %3x%*. When we move %3y%* to
%3cdr[y]%*, we move %3x%* to %3cdr[x]%*. If we wanted to modify
%6before%* rather than %6at%*, we could proliferate the "back pointers," but
if this kind of generality is required a change of representation
is called for. We might resort to the double-linking scheme introduced on
{yon(P164)}; more complex representations are also discussed
in detail in ⊗↑[Knu#68]↑ Chapter 2.
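.FP
As an illustration of the first scheme, here is a sketch of a
"delete-%6at%*": given a pointer %3z%* to a cell in the list %3x%*, other
than the first, it runs a trailing pointer down %3x%* looking for the
predecessor of %3z%* and then splices around it:
.BEGIN SELECT 3;GROUP;TURN OFF"←";TABIT2(18,21);
.KRK
.BOXA
delete_at <= λ[[x;z] prog[[y]
\\y ← x;
\a\[eq[cdr[y];z] → return[rplacd[y;cdr[z]]]];
\\y ← cdr[y];
\\go[a] ]]
.BOXB
.END
.FP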
A LISP
implementation which stores p-lists as list structure would
use %3rplaca%* and %3rplacd%* heavily; for example, functions
which modify properties on the p-lists would use these functions. Here are the
two p-list manipulating functions, ⊗→%3putprop%*↔← and ⊗→%3remprop%*↔←.
.GROUP;
.DEF
%3putprop%* was introduced on {yon(P52)}.
Recall that the effect of %3putprop%* is to attach an indicator-value pair
to an atom. If the indicator is already present, then we will simply change its
value;
if the indicator is not present, then we will add the indicator-value
pair to the front of the p-list.
In the definition %3n%* is an atom, %3i%* is an indicator, and %3v%* is
the value to be stored.
.BEGIN SELECT 3;TABIT2(17,20);GROUP; TURN OFF"←";
.KRK
.BOXA
putprop <= λ[[n;v;i] prog[[m]
\\m ← cdr[n];
\a\[eq[car[m];i] → rplaca[cdr[m];v];return[v]];
\\m ← cddr[m];
\\[null[m] → rplacd[n;cons[i;cons[v;cdr[n]]]];return[v]];
\\go[a] ]]
.BOXB
.END
.FP
Note that extended conditional expressions are used in the definition.
.APART
.GROUP;
.DEF
%3remprop%* was also introduced on {yon(P52)}.
%3remprop%* is a predicate used to remove attribute-value pairs from
the property list of an atom.
We will capitalize on the LISP "%3NIL%1-non#%3NIL%1" trick
for predicates and return the removed property value if one is found.
The following implementation of %3remprop%1 does that.
.BEGIN SELECT 3;TABIT3(15,18,50);GROUP;TURN OFF"←";
.KRK
.BOXA
remprop <= λ[[n;i] prog[[m]
\\m ← n;
\a\[eq[cadr[m];i] → return[prog1[\caddr[m];
\\\rplacd[m;cdddr[m]]]]];
\\m ← cddr[m];
\\[null[cdr[m]] → return[%ef%3]];
\\go[a]]]
.BOXB
.END
.APART
.FP
where %3prog1%1 evaluates its arguments from left to right and
returns the value of its first argument.
Applications of %3rplacd%* occur inside ⊗→%3ratom%*↔←
when p-lists are built and added to the object list.
On {yon(P160)}
we will develop a version of LISP's parser which uses pointer modification to
gain efficiency when building the internal representation.
Pointer modification is also used
inside the ⊗→garbage collector↔←; examine the %3sweep%1 phase of the collector
on {yon(P257)}.
.P161:
Finally, pointer modification
allows the construction of self-modifying programs.
This technique is similar to the machine language tricks of self-modifying code
and should be used with similar care.
The freedom to hang yourself should not be construed as an invitation
to do so, but
it again points out the similarities of LISP to machine language and
highlights the differences between LISP and its contemporary high-level
languages.
LISP's central processor %3eval%* operates by traversing and interpreting
the data structure representation of the program; that data structure is also
open for inspection by LISP's data structure manipulating functions.
Since we now have list-modifying functions, we could modify a program
by changing its internal structure. Indeed we can write a program which
modifies its %6own%* structure.
.BEGIN TABIT2(14,16);GROUP;TURN OFF "←";
Here's one:
.SELECT 3;
.KRK;BOXA
foo <= λ[[x] prog[[y;z]
\\z←1;
\\y←sixth[body[foo]];
\a\print[x];
\\rplaca[rest[y];z←add1[z]];
\\go[a] ]]
.BOXB
.END
.FP
The mystery created by %3y%* is a pointer into the representation of the
statement %3print[x]%*; that representation is %3(PRINT#X)%*. Therefore
the effect of the first %3rplaca%* is to change %3(PRINT#X)%* to %3(PRINT#2)%*.
Subsequent passes through the loop will change the statement to print
%33, 4,%* and so on.
There really isn't much that can be said about such a program.
.CENT(Problems)
.NL
1.##More on %3ratom%*.
Recall the discussion of %3ratom%* in {yonss(P13)} and {yonss(P14)}.
Now that you know about %3rplaca%* and %3rplacd%* write a more detailed
version of %3ratom%*.
.NL
2.##On {yon(P16)} and {yon(P23)}
we wrote various styles of %3reverse%1.
All these functions used %3concat%*; however, we should be
able to reverse a list without using %2any%* new cells.
Express this
algorithm as a LISP function. If you use %3prog%*, don't use any %3prog%*-variables.
.END
.<<NUMBERS>>
.NEXT PAGE
.SS(Numbers,numbers,P293:)
.GROUP;
.FP
In many implementations of LISP, numbers are stored as very simple kinds of
atoms: they
are not stored uniquely, and do not need print names. Most implementations
allow fixed- and floating-point representation; therefore, indicators
for these properties are needed. Thus:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.INDENT 8,8;
fixed-point 1
~
↓
~ ⊂ααααααααπααα⊃ ⊂ααααα⊃
%→ ~ FIXNUM ~ #αβαα→~ 1 ~
%αααααααα∀ααα$ %ααααα$
floating-point 1
~
↓
~ ⊂ααααααααπααα⊃ ⊂ααααααααααααααααααααααα⊃
%→ ~ FLONUM ~ #αβαα→~ <machine rep of 1.0> ~
%αααααααα∀ααα$ %ααααααααααααααααααααααα$
.BOXB
.END
.APART;
.FP
The number is stored in FWS and the
type is indicated by a minimal property list.
This representation
is expensive in space and adds significant overhead to the execution of
arithmetic
operators. Several techniques have been used to improve LISP arithmetic.
Assume that the addressing
space of the machine is 2%818%* and that the usual size of a LISP
memory image is N; within the LISP system, all
references to memory locations greater than N are illegal.
We will use these illegal addresses to encode some of the
smaller positive and negative integers, mapping zero onto the middle
address, the positive numbers onto lower addresses, and the negatives
onto the higher addresses. These smaller integers, called
%2INUMS%*, are
represented by pointers outside of the normal LISP addressing space.
This trick can considerably decrease the storage requirements for
applications
which use small numbers extensively.
.GROUP;
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TABS 32;
.BOXA
⊂αααααααααα⊃ 2∞818∞*
~?~
~?~
~?~ m ∞1<∞* 0
~?~
~?~ m = 0
~?~
~?~ m ∞1>∞* 0
~?~
~?~
εααααααααααλ N
~?~
~?~
~?~
~?~
~?~
~?~
~?~
~?~
%αααααααααα$ 0
.BOXB
.END
.LE(6,Picture of INUM Space)
.APART
.FP
The INUM representation adds some complexity since
the arithmetic operators now have to recognize these illegal pointers as
encoded numbers.
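.FP
A sketch of the convention: with %3Z%* naming the middle address of the
illegal region, a pointer %3p%* encodes an INUM exactly when it lies above
%3N%*, and its value is recovered by a subtraction. The names are
hypothetical:
.BEGIN SELECT 3;GROUP;TURN OFF "←";
.KRK
.BOXA
is_inum <= λ[[p] greaterp[p;N]]
.PT2
inum_val <= λ[[p] difference[Z;p]]
.BOXB
.END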
The MACLISP (⊗↑[Moo#74]↑) implementation uses a different representation for
numbers.⊗↓Much care went into the representation of numbers
in MACLISP. That LISP system
is used as the implementation language for MACSYMA
(⊗↑[MAC#74]↑, ⊗↑[Wan#75]↑, ⊗↑[Mos#74]↑),
a large algebraic and symbolic mathematics system. MACLISP's efficient
number facilities, coupled with its optimizing compiler,
have resulted in the production of compiled code which is more efficient
than that produced by DEC's FORTRAN compiler (⊗↑[Fat#73]↑).←
In that implementation, two spaces are allocated for number storage:
FIXNUM space and FLONUM space. This makes a more compact representation since
the type information is implied in the address of the object rather than
being explicitly
stored. To those basic spaces we add two temporary stack areas: FIXPDL and
FLOPDL. These areas are used for temporary arithmetic computation.
The temporary areas work in conjunction with a type declaration option
used to aid the MACLISP compiler.
If we know that certain variables are %6always%1
going to be used as numbers in a particular function, then we can compile
better code. Assume %3x%1 and %3y%1 are to be used %6only%1 as FIXNUMs
within a function %3f%1; we would make such declarations for the
MACLISP compiler
just as we can declare some variables as "special" to other
LISP compilers.
When we allocate space for %3x%1 and %3y%1, we allocate space on the
top of FIXPDL. Within %3f%1 the arithmetic operations use the hardware
arithmetic and reference the stack elements. The stack elements can be
passed to other arithmetic functions called within %3f%1, and no permanent
storage need be allocated in FIXNUM space until later.
The efficiency of arithmetic operations is dependent on the existence
of special hardware instructions for such arithmetic.
However, special hardware also places limits on the arithmetic
capabilities of most languages: arithmetic
is usually limited by the word size
of the machine.
There are
several versions of LISP which will automatically change
representation when faced with overflow.
This scheme is called %2arbitrary precision arithmetic%1 and
has been implemented for both fixed-point and floating-point
numbers. We will describe a representation for fixed-point numbers
called %2BIGNUMS%1; they could have the following structure:
.GROUP;
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.INDENT 10,10;
positive big number
~
↓
~ ⊂ααααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%→ ~ POSNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
%αααααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
↓ ↓
N∞40∞* N∞4n∞*
negative big number
~
↓
~ ⊂ααααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%→ ~ NEGNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
%αααααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
↓ ↓
N∞40∞* N∞4n∞*
.BOXB
.END
.LE(8,Structure of a BIGNUM);
.APART
.BEGIN GROUP;
.FP
The value of a BIGNUM is given by:
.EQ
%7b%1(N%40%1 + %7a%1N%41%*+ ... + %7a%8n%1N%4n%1)
.PT18;
.FP
where %7b%1 is either + or - and
%7a%1-1 is the largest number representable in one machine word.
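For example, taking %7a%1 = 100 for illustration, a positive BIGNUM with
N%40%1 = 41 and N%41%1 = 3 denotes 41 + 3*100, or 341.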
The translations between BIGNUMS and the other numeric representations
are done automatically.
On most implementations of LISP, no attempt is made to store numbers uniquely.
Thus %3eq%1 will not work on numbers other than INUMs; either
%3equal%1 is extended for numbers
or a special equality predicate for numbers is provided.
.END
.SS(Stacks and Threading)
.FP
Though recursive algorithms are usually more illuminating than
their machine-oriented counterparts, it is frequently more efficient
to encode those algorithms in manners which can take advantage of the
hardware. This section will discuss two techniques which "unwind" the recursion
and typically lead to faster execution.
Recall the marking phase of a garbage collector in {yonss(P146)}. There
we wrote %3mark%* as a recursive algorithm. We could equally well write %3mark%*
using an explicit stack:
.BEGIN TABIT3(16,23,43);SELECT 3;GROUP;TURN OFF "←";
.boxa
.krk
mark <= λ[[tr]prog[[st]
\loop\[is_marked[tr] → go[chk_st];
\\ is_full_wd[tr] → markA[tr];go[chk_st];
.PT2
\\ is_free_wd[tr] →\st←push[cdr[tr];st];
\\\markA[tr];
\\\tr←car[tr];go[loop];
\\ %Et%* → go[chk_st]]; ⊗↓%1This branch of the conditional could be omitted
and the effect would be the same.←%3
\chk_st\[null[st] → return[%et%*]];
\\tr←top[st];
\\st←pop[st];
\\go[loop] ]]
.boxb
.APART;
.group;
push <= λ[[i;st] concat[i;st]]
.PT2
top <= λ[[st] first[st]]
.PT2
pop <= λ[[st] rest[st]]
.boxb
.END
.FP
Notice that we save only the %3cdr%*-node in the stack; even at that, the
stack grows proportionally to the depth of the tree being traversed.
See the problem on {yon(P162)}.
The technique of using an explicit stack sometimes is more intuitive and
sometimes will lead to faster execution.
The second technique is more tricky but will lead to significant pay-offs
in execution time.⊗↓But there will be a proportional loss in clarity
in the code.←
The technique is called %2⊗→threading↔←%*.
The basis for threading is a desire to traverse tree structures in a more
efficient fashion than that typically available
implicitly in recursion, or explicitly via
stacks. Recall that on {yon(P164)} we surmised
that %2⊗→double-linking↔←%* might be
advantageous in moving up and down the "spine" of a tree structure.
Double links would allow us to find the successors and predecessors of nodes
easily. However the extra link gives us no help if we wish
to descend into the substructure. It is this area to which threading addresses
itself: descent into tree structure.
Examination of the new %3mark%* algorithm will reveal that for a
%6fixed%* tree and a %6fixed%* order of traversal,
any two applications of marking will
have the same pattern of behavior. The order of visitation to each node
will be the same, but more importantly,
the dynamic changes in the state of the stack
will %6also%* be the same. Instead of recomputing that portion of the stack on each traversal,
it might be possible to store the stack information in the structure itself.
Threading hides the control structure in the data structure.
Typically,
threading
requires a more complex data structure since we must store both
threads and links. The traversal algorithms also become more complex since
we must recognize the difference between
control threads and data links. Care must also be taken if we wish to share
threaded list structure. See ⊗↑[Knu#68]↑ for a complete discussion
of the techniques and tricks.
We do not wish to complicate the LISP structures, but dispensing with a
stack, be it implicit or explicit, does
influence storage requirements. We can
strike a compromise; instead of permanently storing the threads
in the structure,
we can %6temporarily%* store threads as we traverse trees.
The first application is in the design of a nonrecursive %3read%* program.
The second application we will describe is in the mark phase of a garbage
collector.
.CENT(Problem)
.P162:
.NL
1.##With a little
more testing before stacking we can significantly cut down the
number of %3push%*es we have to do. Namely, if some of the branches point
immediately to atoms we might as well mark them at that time and
proceed without doing a stack operation. Only when both branches are "nonatomic"
do we need to stack the %3cdr%*. Write such an algorithm.
Further, is it better to stack the %3car%1 nodes or the %3cdr%1 nodes?
.SS(A Non-recursive %3read%*)
.P160:
.FP
The original %3read%* algorithm of {yonss(P13)} is a good example of
a clear recursive algorithm;
it is reasonably straightforward to follow the flow of the algorithm.
However, now that we understand what the run-time behavior of such a recursive
program is, we see that %3read%* is a drain on %6two%*
resources: it uses
free-space to construct the internal representation of the input;
it uses the run-time stack in the implementation of the
recursion and for saving parameters and intermediate computations. A deeply
nested expression will use a lot of the run-time stack.
Clearly, there is nothing we can do about the drain on the free lists,⊗↓We
probably will be drawing on the full word area for print name
storage as well as on the free space area for
list structure storage.← but threading %6can%* dispense with the run-time
stack.
We can in fact do so without a proportional increase in the use of
free space; indeed we need only %6one%* additional free cell, regardless
of the complexity of the input! The algorithm will be much more complex than
the recursive parser, but that's why this section on storage and efficiency
is where it is. Now that the purpose and intent of %3read%* are
well understood, we can be
clever and efficient.
First we describe the basic ideas of the algorithm, then we give the algorithm.
The main idea in the algorithm is the realization that we can determine
the storage requirements for a complex S-expr or list structure as we read it in.
For example, consider
the input string "%3(A#(B#C)#D)%*". As we start our left-to-right scan of the
input,
we see "%3(%*". This immediately tells us that we need at least one %3cons%*.
We read "%3A%*"; that tells us what the %3car%* of the expression is. Notice that
we don't yet know whether the expression is "dotted" or "listed," but the
storage requirements will be the same. On reading the next open parenthesis we
know we need to add a new level in the developing representation. The
"%3B%*" and "%3C%*" add elements to that level, and the closing parenthesis finishes
it off. The closing parenthesis also should signal our parser to return to the
prior level and continue scanning the input. The "%3D%*" goes on that
level and the final closing parenthesis completes the input.
To implement this informal idea, we keep a thread in the %3cdr%*-part of
the last cell on every level. When we go down a level we manufacture a new
cell with the %3cdr%* pointing to the cell we just came from in the previous
level; this happens when we see a left parenthesis. We go up a level when we see
a right parenthesis; that is done by following up the thread in the current level,
after doing appropriate cleanup.
.group;
There are three basic states in the reader:
.NL
%21.%*##The next input should go into the %3car%*-part of the current cell.
This state is entered when we go down a level. It is labeled %3head%* in the
following program.
.NL
%22.%*##The next input should go on the current level. This is the typical state
in the building of a list-input. Here we add a new cell in the current level
and put the input in the %3car%*-part of that cell; then stay in this state.
This state corresponds to label %3tail%*.
.NL
%23.%*##The other main state occurs on reading a dot when in %3tail%*#state.⊗↓Dots
seen in any other context are errors.← In dot#state we check the next input;
if it is an atom we store it on the thread and
follow the thread. If the input is a left parenthesis
we add a new cell on the current level and enter %3head%*#state.
.PT24
.APART
There are some anomalies in the algorithm since it must
recognize both S-expr notation
and list notation. To handle both
kinds of input, we add a parenthesis counter; it increments for
left parentheses and decrements for right parentheses. A legal input has been
recognized when we are back at the top level and the count is zero.
The final difference between the old parser and the new one involves
the scanner %3ratom%*. We assume a new %3ratom%* which reads %3()%* and
returns %3NIL%*. If the scanner
sees an open parenthesis, it looks ahead to the next meaningful
character.⊗↓It ignores
spaces and the like.← If the character is a closing parenthesis,
the scanner takes it;
if the character is not, it is left for the next call on %3ratom%* and
%3ratom%* returns with an indication that it has seen a left parenthesis.
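.FP
Such a scanner might be sketched as follows. We assume a function %3peekc%1
which returns the next meaningful character of the input without consuming
it, a function %3readch%1 which consumes one character, and %3ratom1%1, the
old scanner, which gathers atoms, dots, and right parentheses. We also
assume that %3is_lpar%1 and %3is_rpar%1 apply to single characters as well
as to scanner values, and that %3LPAR%1 is the distinguished value
recognized by %3is_lpar%1. All of these names are illustrative, not part of
any particular implementation.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(12);
.KRK
.BOXA
ratom <= λ[[] prog[[]
\[not[is_lpar[peekc[]]] → return[ratom1[]]];
\readch[];
\[is_rpar[peekc[]] → readch[]; return[NIL]];
\return[LPAR] ]]
.BOXB
.END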
.BEGIN TABS 12,21,35,47,49;TURN OFF "←";GROUP;
.TURN ON "\";NOFILL;
.KRK
.BOXA
With this introduction, here is the new %3read%*:
.boxa
.select 3;
read <= λ[[] prog[[j;cp;count;top;temp]
\\count←init[]; cp←count; top←cp;
\head\j←ratom[];
\\[or[is_dot[j];is_rpar[j]] → err[];
.PT2
\\ is_lpar[j] →\incr[count];
\\\cp←down[cp];
\\\go[head];
\\ atom[j] → stuff[cp;j]; go[ckend]];
\tail\j←ratom[];
\\[atom[j] → cp←insert_move[cp;j]; go[ckend];
.PT2
\\ is_rpar[j] →\decr[count];
\\\[eq[top;cp] → go[ck1];
\\\ %et%* → cp←stuff_up[cp;NIL]; go[ckend]];
.PT2
\\is_lpar[j] →\incr[count];
\\\cp←down[insert_move[cp;NIL]];
\\\go[head];
.PT2
\\is_dot[j] →\j←ratom[];
\\\[or[is_dot[j];is_rpar[j]] → err[];
\\\ is_lpar[j] →\incr[count];
\\\\\cp←insert_move[cp;NIL];
\\\\\go[head];
.PT2
\\\ atom[j] →\cp←stuff_up[cp;j];
\\\\go[ckend]]]; ⊗↓%1This %3go%1 is superfluous, but makes the flow more apparent.←%3
\ckend\[eq[cp;top] → go[ck1];
\\ %et%* → go[tail]];
\ck1\temp← cnt[top];
\end2\[zerop[temp] → return[exp[top]]];
\\j←ratom[];
\\[is_rpar[j] → temp←sub1[temp]; go[end2];
\\ %et%* → err[] ]]]
.BOXB
.END
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(29);
.KRK
.BOXA
init <= λ[[] cons[NIL;0]]
stuff <= λ[[x;y] rplaca[x;y]]
incr <= λ[[z] rplacd[z;add1[cdr[z]]]]
decr <= λ[[z] rplacd[z;sub1[cdr[z]]]]
insert_move <= λ[[cp;val] rplacd[cp;cons[val;cdr[cp]]]; cdr[cp]]
down <= λ[[cp] rplaca[cp;cons[NIL;cp]];car[cp]]
stuff_up <= λ[[cp;j] prog[[temp]
\temp ← cdr[cp];
\rplacd[cp;j];
\return[temp]]]
cnt <= λ[[x] cdr[x]]
.PT2
exp <= λ[[x] car[x]]
.BOXB
.END
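.FP
A word about the helper functions: the cell built by %3init%1 does double
duty, holding the parenthesis count in its %3cdr%*-part and, eventually,
the completed expression in its %3car%*-part. %3stuff%1 deposits an item
in the %3car%*-part of the current cell. %3insert_move%1 splices a fresh
cell onto the current level, passes the thread along to it, and moves
%3cp%1 there. %3down%1 opens a new level whose thread points back at the
current cell. Finally, %3stuff_up%1 closes a level: it deposits a proper
%3cdr%*-part, %3NIL%1 for a list or the atom seen after a dot, and returns
the thread so that %3cp%1 can move back up a level.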
.FP
The development and understanding of this algorithm requires most of what we
have covered in the course.
We use our knowledge of the parser, %3read%*; we use our familiarity with
S-exprs stored as linked lists; we have to understand the run-time control
of recursive calling sequences; we have to understand pointer manipulation;
we have to understand pointer modification; and finally we have to be wickedly
clever.
With that understanding we
were able to apply threading at a level higher than a "once#only" trick.
.CENT(Problem)
.NL
%21.%*##Write a version of %3read%* which uses an explicit stack to remember where it
is in the parse.
.SS(More Applications of Threading)
.Cent(A Link-Bending Garbage Collector for LISP)
.P144:
.FP
The use of a stack is one of the difficulties associated with garbage
collection. Garbage collection is invoked when available space has become
exhausted, but here we are asking for %6more%* space to use for stacking.
The usual solution
to such a problem is to allocate a separate area for stack storage. This
has its drawbacks. If we do not allocate enough stack space, a sufficiently
deep piece of structure will exhaust it and the marker will fail; and the
amount of stack space required can be large, proportional to the
depth of the structure being marked.
We can apply threading here,
modifying the structure as we traverse it; as usual the threads will be used as
control information. As we finish marking a branch we
restore the structure to its original topology.
Several versions of such threaded collectors are available; see
⊗↑[Chr#68]↑ for a version
written in AMBIT/G; a more traditional description is found in
⊗↑[Sch#67]↑;⊗↓The correctness of [Sch#67] has been proved by de#Roever.←
and see ⊗↑[Knu#68]↑ for several alternatives.
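.FP
To give the flavor of such a marker, here is a sketch of the link-bending
technique in the style of ⊗↑[Sch#67]↑. Besides the mark bit, manipulated
by the assumed primitives %3markA%1 and %3marked%1, each word is assumed
to carry one extra flag bit, manipulated by %3setflag%1, %3clearflag%1,
and %3flagged%1, which records whether we descended through the
%3car%*-part or the %3cdr%*-part; all five are assumed primitives. The
variable %3q%1 holds the chain of reversed links back toward the starting
point; no stack is used, and every link is restored as the marker retreats.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT3(8,16,28);
.KRK
.BOXA
mark <= λ[[p] prog[[q;t]
\adv\[and[not[atom[p]];not[marked[p]]] →
\\markA[p]; t←car[p]; rplaca[p;q]; ⊗↓%1Reverse the %3car%1 link and descend.←%3
\\q←p; p←t; go[adv]];
\\[atom[p] → markA[p]];
\ret\[null[q] → return[NIL]];
\\[not[flagged[q]] →\setflag[q]; t←cdr[q];
\\\rplacd[q;car[q]]; rplaca[q;p]; ⊗↓%1Restore the %3car%1; the thread moves to the %3cdr%1.←%3
\\\p←t; go[adv]];
\\clearflag[q]; t←cdr[q]; rplacd[q;p]; ⊗↓%1Restore the %3cdr%1 and retreat.←%3
\\p←q; q←t; go[ret] ]]
.BOXB
.END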
.Cent(Binding Implementations)
.FP
Threading can be used in the shallow binder described in {yonss(P228)} to remember
the path through the environment tree#(⊗↑[Urm#76]↑).
We thread from E%4bind%1 to E%4inter%1
when we are looking for E%4inter%1.
This consists of reversing the access links as we proceed toward E%4inter%1.
Then, as we swap back the value cells, we will unthread from
E%4inter%1 to E%4bind%1.
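.FP
A sketch of the two phases, assuming each environment block carries an
access link read by %3alink%1 and set by %3setalink%1, and assuming
%3swapcells%1 exchanges the saved values in a block with the corresponding
value cells, might read as follows; all three names are illustrative.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT2(10,14);
.KRK
.BOXA
thread <= λ[[e;goal] prog[[prev;next]
\lp\[eq[e;goal] → return[prev]];
\\next←alink[e]; setalink[e;prev]; ⊗↓%1Reverse the access link.←%3
\\prev←e; e←next; go[lp]]]
.PT2
unthread <= λ[[last;goal] prog[[succ;next]
\\succ←goal;
\lp\[null[last] → return[NIL]];
\\swapcells[last];
\\next←alink[last]; setalink[last;succ]; ⊗↓%1Restore the forward link.←%3
\\succ←last; last←next; go[lp]]]
.BOXB
.END
.FP
%3thread%1 returns the block nearest E%4inter%1; handing that block to
%3unthread%1 swaps the value cells back and restores the original chain.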
.SS(Storage Management and LISP,,P224:)
.FP
There are two basic areas of LISP which require attention:
the implementation of data structures, and the implementation of a
LISP machine. We will discuss applications in that order.
LISP's typical data object is a dotted pair; however,
dotted pairs are frequently used to represent more structured objects.
For example, many common LISP programs involve list#operations on
list representations. But lists, we know, are representations of sequences.
From {yonss(P137)}
we now also know that arrays are efficient representations of sequences.
Indeed array representations are typically more efficient than the
general LISP linked-list. We would like to capitalize on this more
efficient representation without jeopardizing the LISP operations.
An analysis of the LISP operations shows that we %6share%*
substructures and, when %3rplaca%* or %3rplacd%* is used, we
%6modify%* existing structures. Any proposed economies in storage layout
must take cognizance of these facts; fortunately the requirements are compatible.
.GROUP;
.FP
Consider the typical representation of the sequence:
.BEGIN CENTER;SELECT 3;
.BOXA
(LAMBDA (X) (F X Y))
.END
.PT2
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
⊂ααααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ LAMBDA ~ #αβαα→~ # ~ #αβα→~ # ~≤'~
%αααααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
~ ↓
⊂αααααααααα$ ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
↓ ~ F ~ #αβ→~ X ~ #αβ→~ Y ~≤'~
⊂αααπαα⊃ %ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
~ X ~≤'~
%ααα∀αα$
.BOXB
.END
.APART
.FP
This representation takes seven words.
The %3car%*-part of each cell contains the information;
the %3cdr%*-part tells where the rest of the expression is to be found.
That is, we have dedicated fourteen half-words to represent the structure,
only seven of which contain the actual information we wish to store.
Using some extra encoding we can carry the same information in
seven slightly larger cells.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.boxa
.indent 15,15
⊂ααααααααα⊃
~↓ LAMBDA ~
εαααααααααλ ⊂αααααααα⊃
~↓ #αβαααα→~/ X ~
εαααααααααλ %αααααααα$
~/ #αβα⊃
%ααααααααα$ ↓ ⊂αααααααα⊃
%αα→~↓ F ~
εααααααααλ
~↓ X ~
εααααααααλ
~/ Y ~
%αααααααα$
.boxb
.END
.FP
The intent of the special characters is to encode type information about
the %2next%* cell in the representation; the technique is thus called %2⊗→%3cdr%*-coding↔←%1.
The %b↓%* means the next cell %2is%* the %3cdr%*; %b/%* means the %3cdr%* is
%3NIL%*.
The typical LISP cell is a third variety of %3cdr%*-coding: the code %b→%* says
the next cell contains a pointer to the %3cdr%*. With that, we introduce the final code:
%b*%* means this cell is the %3cdr%*-half of a LISP word.
Thus %3(A#B)%* could be expressed in any of the following forms:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
⊂ααααα⊃ ⊂ααααα⊃
~↓ A ~ ~→ A ~
εαααααλ εαααααλ ⊂ααααα⊃
~/ B ~ ~* #αβααα→~/ B ~
%ααααα$ %ααααα$ %ααααα$
⊂ααααα⊃ ⊂αααααπααααα⊃ ⊂αααααπααααα⊃
~→ A ~ ~ A ~ #αβααα→~ B ~ ≤' ~
εαααααλ ⊂ααααα⊃ %ααααα∀ααααα$ %ααααα∀ααααα$
~* #αβαα→~→ B ~
%ααααα$ εαααααλ
~* NIL~
%ααααα$
.BOXB
.END
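.FP
The effect of the codes on the primitive operations can be sketched in a
few lines. Assume %3code[c]%1 extracts the %3cdr%*-code of a cell,
%3next[c]%1 addresses the physically following cell, and %3ptr[c]%1
fetches the pointer a cell contains; assume too that the four codes are
represented by the atoms %3DN%1#(%b↓%1), %3NILC%1#(%b/%1),
%3PTR%1#(%b→%1), and %3STAR%1#(%b*%1). A real implementation would of
course do this in hardware or microcode; the names are only illustrative.
.BEGIN SELECT 3;GROUP;TABIT1(14);
.KRK
.BOXA
ccar <= λ[[c] ptr[c]]
.PT2
ccdr <= λ[[c]\[eq[code[c];DN] → next[c];
\ eq[code[c];NILC] → NIL;
\ eq[code[c];PTR] → ptr[next[c]];
\ %et%* → err[]]] ⊗↓%1A %3STAR%*-coded cell is only the %3cdr%*-half of a word; taking its %3cdr%* is an error.←%3
.BOXB
.END
.FP
Notice that for the %b↓%1 and %b/%1 codes the %3cdr%1 is available without
any further memory reference; that is the source of both the space and the
speed advantage.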
However, this encoding scheme is not sufficient as it stands. Consider the
following example: given internal pointers %3x%* and %3z%* into
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 13,13;
.BOXA
x z
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ F ~ #αβαα∀αα→~ X ~ #αβαα→~ Y ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.FP
and assume we wish to perform %3rplacd[x;(A#B#C)]%*.
Using
our standard implementation, we would have:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.indent 8,8
.BOXA
x z
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ F ~ # ~ %αα→~ X ~ #αβαα→~ Y ~≤'~
%ααα∀αβα$ %ααα∀ααα$ %ααα∀αα$
↓
~ ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.FP
However, a problem arises if
%3(F X Y)%* is represented in its compact form.
We can't replace the cell
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 8,8;
.BOXA
⊂ααααα⊃ ⊂αααααα⊃
~↓ X ~ by ~* #αβ→ to ∞3(A B C)∞*
%ααααα$ %αααααα$
.BOXB
.END
.FP
since the value of %3z%* would change. The solution is an application of the
forwarding address scheme we introduced on {yon(P173)} in the compacting
garbage collector.
We put a forwarding address in the cell referenced by %3x%*;
then allocate a new pair of half-cells, putting %3F%* in the first and a pointer to
%3(A#B#C)%* in the second.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
x ⊂ααααααααα⊃ ⊂αααααααα⊃ ⊂αααααααα⊃
%→αα→~i #αβαααα→~↓ F ~ ⊂→αα→~↓ A ~
εαααααααααλ εααααααααλ ↑ εααααααααλ
z →α→~↓ X ~ ~* #αβα$ ~↓ B ~
εαααααααααλ %αααααααα$ εααααααααλ
~/ Y ~ ~/ C ~
%ααααααααα$ %αααααααα$
.BOXB
.END
.FP
These forwarding addresses are an instance of %2⊗→invisible pointers↔←%*
used by R. Greenblatt in his LISP machine; he has also implemented
hardware %3cdr%*-coding.
Between invisible pointers and
%3cdr%*-coding, we %6can%* effect all LISP operations using this
potentially more compact representation.
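.FP
The discipline might be sketched as follows, continuing the notation of
the %3cdr%*-coding sketch: every operation first chases invisible
pointers, and a %3rplacd%1 on a compactly coded cell copies the cell and
leaves a forwarding address behind. We assume %3is_inv%1 and %3inv%1 test
for and follow an invisible pointer, %3alloc2[]%1 delivers two fresh
adjacent cells, %3setptr%1 and %3setcode%1 store into a cell, and
%3mkinv%1 overwrites a cell with an invisible pointer; all of these are
hypothetical.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(12);
.KRK
.BOXA
chase <= λ[[c] [is_inv[c] → chase[inv[c]]; %et%* → c]]
.PT2
crplacd <= λ[[x;y] prog[[c;n]
\c←chase[x];
\[eq[code[c];PTR] → setptr[next[c];y]; return[c]];
\n←alloc2[]; ⊗↓%1The cell is %b↓%1- or %b/%1-coded; recopy it in full-word form.←%3
\setptr[n;ptr[c]]; setcode[n;PTR];
\setptr[next[n];y]; setcode[next[n];STAR];
\mkinv[c;n];
\return[n]]]
.BOXB
.END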
We must be able to maintain that compact representation while the program
is running. This requires more care in the management of storage.
We cannot simply garbage collect and let the space fragment; nor can we
use the simple compacting garbage collector
discussed in {yonss(P291)} since it does not
attempt to maintain the compact representation.
Several algorithms
with the desired properties exist (⊗↑[Che#70]↑, ⊗↑[Cla#76]↑).
One feature of this data representation is its use of variable-sized
linked blocks of sequential storage. The management of these storage
blocks is more complex than that of simple dotted#pairs, but the
additional overhead may be acceptable if it gives better locality of reference
and faster access to list elements.⊗↓Notice that the %3cdr%1-coded representations
of %3(A#B)%1 and %3(A#.#B)%1 are equally expensive. In the typical linked-list
representation, %3(A#B)%1 requires more space than %3(A#.#B)%1.←
There is less conflict about the use of more complex storage
management techniques in the area of LISP's dynamic implementation.
The original versions of LISP#1.5 used dotted pair structure to represent
the access environments.⊗↓The control information did use a stack implementation
coded in machine language.← This generality gave a correct solution to the
implementation of %3function%1, but experience with LISP implementations
has shown that it is quite expensive to maintain this generality
when most applications are of a less general nature.
Implementation techniques, patterned after our Weizenbaum
diagrams, allow some economies without loss of generality.
Again, storage would be allocated in sequential blocks; each block would
be of a size sufficient to hold the representation of the name-value entries
along with the additional areas to link the block to the environment.
The storage blocks need not be allocated sequentially; indeed, in the
general case blocks cannot be allocated sequentially. The de-allocation
problems are somewhat different from those experienced
by data structure representations.
The environment structures are much more "well#behaved" than general list-structures.
Therefore an "environment garbage collector" may not be needed.
The most general techniques for management of LISP's dynamic
environment are based on ⊗↑[Bob#73a]↑ and succeeding papers.⊗↓There is something
contradictory about LISP implementors' attitudes toward storage
and dynamics. Much effort is expended in attempting to
minimize the overhead involved in the dynamic operation of LISP;
it is frequently stated that users should not be penalized
for access/control constructs which they do not use. However, that
attitude is not extended to LISP's data structures. There are very
generous subsets of LISP applications in which the data structure operations
are sufficiently well#behaved that storage reclamation techniques
less general than garbage collection are applicable. Analysis of this
area of LISP should lead to profitable results.←
At a lower level of implementation, LISP has much to say about machine
organization. The implementation of efficient environment-swapping
algorithms is a problem which any operating system must face.
The traditional solutions impose severe restrictions on interprocess
communications. The algorithms developed for LISP show promise
for giving efficient implementations of more general scope.
LISP's organization of memory also has lessons for machine architecture. The
management of large variable-sized memory spaces (⊗↑[Ste#73]↑,#⊗↑[Wegb#70]↑)
can be supported in hardware. The allocation and de-allocation of such large
spaces also require care; LISP implementors have begun to address these problems
(⊗↑[Ste#76a]↑, ⊗↑[Bis#74a]↑).
.SS(Hash Techniques,,P248:)
.FP
One phenomenon of LISP is the
sheer size of data structures which a large LISP program generates.
Many LISP projects approach 10%87%1 bits of program and data.
Several techniques have been developed to help shrink data representation;
%3cdr%1-coding ({yonss(P224)}) is one technique. Another technique stems
from the observation that LISP tends to %6copy%1 structures rather
than %6share%1 them. We know that the sharing of structures must be done
with great care if modification operations like %3rplaca%1 and %3rplacd%1
are present, but sharing of structure can mean a significant saving in space.
In fact, the saving in space can also speed up the algorithms which manipulate
the structures. For example, if every list#structure is stored uniquely,
then the time for the equality test %3equal%1 is a constant rather than being
proportional to the depth of the structure.
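.FP
Indeed, under such a regime the test collapses to a single pointer
comparison; a minimal sketch, assuming %3eq%1 is meaningful on any pair of
pointers, atomic or not:
.BEGIN SELECT 3;GROUP;
.KRK
.BOXA
equal <= λ[[x;y] eq[x;y]]
.BOXB
.END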
We present two techniques for maintaining unique structure:
either maintain list space such that unique representations are always
present or supply an algorithm which will "uniquize" structures upon
request. The first alternative is usually called %2⊗→hash consing↔←%1; the
second technique is called %2list condensation%1#(⊗↑[Lin#73]↑).
A condensation algorithm must remove all duplicated structure from within
a list. Since condensation is a component of many hashed LISP implementations,
we will concentrate our attention on hash consing.
Hash consing is an extension of the LISP technique for generating
unique atoms. Since
list structure is created only by the %3cons%1 operation,⊗↓However, list structure
may be modified by %3rplaca%1 and %3rplacd%1.←
we place the responsibility for maintaining unique structure within %3cons%1.
If the result of a pending %3cons%1 is already present, we
return a pointer to that structure, otherwise we perform the %3cons%1
and record the result so that it will be retrieved if the same
%3cons%1 happens again. The adjective "hash" is applied to this version of
%3cons%1 since the typical implementation uses a hashing algorithm to
maintain the uniqueness.
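.FP
A sketch of such a %3cons%1, here called %3hcons%1, might run as follows.
We assume %3hash%1 maps a pair of pointers to a bucket index, and
%3bucket%1 and %3setbucket%1 read and set the bucket table entries; these
names are illustrative only.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT2(10,14);
.KRK
.BOXA
hcons <= λ[[x;y] prog[[b;l]
\\b←hash[x;y]; l←bucket[b];
\lp\[null[l] → go[new]];
\\[and[eq[car[car[l]];x];eq[cdr[car[l]];y]] → return[car[l]]];
\\l←cdr[l]; go[lp];
\new\l←cons[x;y];
\\setbucket[b;cons[l;bucket[b]]]; ⊗↓%1The bucket lists themselves are built with the ordinary %3cons%1.←%3
\\return[l]]]
.BOXB
.END
.FP
The uniqueness invariant holds only for cells created through %3hcons%1;
a matching pair, if one exists, is found in the bucket by pointer
comparisons on its two halves.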
Hash %3cons%1ing imposes restrictions on the
programmer's use of the list modification operations: since structures are
shared, a modification made through one pointer is seen through all of them.
An implementation may disallow list modification entirely; it may supply
additional operations to copy a structure, modify the copy, and "uniquize" the
result; or it may supply different kinds of structures, some modifiable and some not.
A hash %3cons%1 was proposed for LISP 1.75 on the IBM#M44, but the
implementation was never
completed. A limited version of hash %3cons%1ing was implemented as
an extension of LISP 1.6 at Stanford.
Impressive and extensive applications of hashing appear
in HLISP (⊗↑[Got#74]↑, ⊗↑[Ter#75]↑). That implementation of LISP
supplies two different kinds of structures: ordinary list structure
and "monocopy" structures. Operations are also supplied for
conversion between types. Extensive analysis of hashing effects
on algorithm performance has also been done (⊗↑[Got#76]↑).
HLISP also employs hashing in its implementation of property lists.
Property lists are not stored explicitly, but rather the atom name and the
property name are used to form a hash index; the property value is associated
with that hash index.
For example,
%3get[x;i]%1 hashes with both %3x%1 and %3i%1 to retrieve the property value.
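.FP
Schematically, assuming %3hash2%1 forms an index from the two names and
%3hget%1 and %3hput%1 retrieve and associate a value under such an index,
with collision handling elided, the property operations might read:
.BEGIN SELECT 3;GROUP;
.KRK
.BOXA
get <= λ[[x;i] hget[hash2[x;i]]]
.PT2
putprop <= λ[[x;i;v] hput[hash2[x;i];v]]
.BOXB
.END
.FP
This conveys only the flavor of the HLISP scheme; the actual implementation
must also handle collisions and the removal of properties.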
The other major implementations of LISP also
offer specialized operations for dealing with hashed quantities;
see ⊗↑[Moo#74]↑, ⊗↑[Int#75]↑, and ⊗↑[Bob#75]↑.